
Move Job objects to postgres #368

Merged
merged 15 commits into CERT-Polska:master from feature/postgres-job on Feb 5, 2024

Conversation

@msm-code (Contributor) commented Feb 4, 2024

Your checklist for this pull request

  • I've read the contributing guideline.
  • I've tested my changes by building and running mquery, and testing changed functionality (if applicable)
  • I've added automated tests for my change (if applicable, optional)
  • [n/a] I've updated documentation to reflect my change (if applicable)

What is the current behaviour?
Job objects are kept in redis

What is the new behaviour?
Job objects are kept in postgres.

I'm consciously trying to keep everything very self-contained, and I don't want to change a single line outside of the Database abstraction. Having this layer is very useful when swapping out the whole storage layer.

Related to #74

@msm-code msm-code changed the title Feature/postgres job Move Job objects to postgres Feb 4, 2024
@msm-cert msm-cert requested a review from nazywam February 5, 2024 12:03
docs/redis.md Outdated (review thread resolved)
src/app.py Outdated
@@ -492,11 +492,11 @@ def matches(

 @app.get(
     "/api/job/{job_id}",
-    response_model=JobSchema,
+    response_model=Job,
@nazywam (Member) commented Feb 5, 2024:

We're fine with returning the whole row from db, right? I.e. there's no sensitive stuff there that users shouldn't have access to

Member:

(Because if that was the case, we could define a schema for reading the job object)

@msm-cert (Member) commented Feb 5, 2024:

Hmm, for the postgres migration I had to define internal_id for the Job object. It's not really sensitive, but by design it's not needed for anything. I'll think about a Read object.
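For illustration, a Read object along these lines would keep internal_id out of API responses. This is only a sketch: JobView and job_to_view are hypothetical names, and apart from id, status and finished (which appear elsewhere in this PR) the field list is made up.

from typing import Optional

from sqlmodel import SQLModel


class JobView(SQLModel):
    # Hypothetical read schema: user-facing fields only, no internal_id.
    id: str
    status: str
    finished: Optional[int] = None


def job_to_view(job: "Job") -> JobView:
    # Job is the table model added in this PR; project it explicitly so
    # internal columns never leak into API responses.
    return JobView(id=job.id, status=job.status, finished=job.finished)

The /api/job/{job_id} endpoint could then declare response_model=JobView instead of exposing the full Job row.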

@@ -67,55 +68,36 @@ def __schedule(self, agent: str, task: Any, *args: Any) -> None:

     def get_job_ids(self) -> List[JobId]:
         """Gets IDs of all jobs in the database"""
-        return [key[4:] for key in self.redis.keys("job:*")]
+        with Session(self.engine) as session:
Member:

We're probably going to be writing this a lot; should we maybe create a helper like with self.get_session()?

Member:

I actually think it'll be useful to even have something like

self.execute(update(Job).where(...))

to handle commits etc. automatically (this is a very common pattern). I'll do a separate refactor PR in a while.
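As a rough sketch of what such helpers could look like (get_session and execute are hypothetical names; the existing Database class already owns self.engine):

from contextlib import contextmanager
from typing import Any, Iterator

from sqlmodel import Session


class Database:
    # ... existing __init__ creates self.engine ...

    @contextmanager
    def get_session(self) -> Iterator[Session]:
        # Open a session, hand it to the caller, commit if no exception was raised.
        with Session(self.engine) as session:
            yield session
            session.commit()

    def execute(self, statement: Any) -> None:
        # Run a single write statement in its own committed session.
        with self.get_session() as session:
            session.execute(statement)

A call site would then shrink to self.execute(update(Job).where(Job.id == job).values(status="removed")).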

@@ -67,55 +68,36 @@ def __schedule(self, agent: str, task: Any, *args: Any) -> None:

     def get_job_ids(self) -> List[JobId]:
         """Gets IDs of all jobs in the database"""
-        return [key[4:] for key in self.redis.keys("job:*")]
+        with Session(self.engine) as session:
+            jobs = session.exec(select(Job)).all()
Member:

We probably could optimize this by selecting just the required row. But if there's not that many jobs and no automatic table joins it's probably not worth the effort at this moment?

Member:

Do you mean just the required column? Yeah, I've thought about this, but when I wrote it I only wanted to get it working and come back to it later.

It's not worth refactoring IMO, because this function is the wrong abstraction. It's used in some places, for example like this:

for job_id in db.get_job_ids():
    db.get_job(job_id)

I didn't change it right away to avoid changes outside of db.py, but the database abstraction needs to be changed significantly later.
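If the column-only query ever becomes worth it, the change would roughly be selecting the id column instead of whole rows, with the same SQLModel session pattern used elsewhere in this PR:

with Session(self.engine) as session:
    # Fetch only the id column instead of full Job rows.
    return list(session.exec(select(Job.id)).all())

Everything else in get_job_ids would stay the same.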

{"status": "cancelled", "finished": int(time())},
)
def cancel_job(self, job: JobId, error=None) -> None:
"""Sets the job status to cancelled, with optional error message"""
Member:

What happens to the error message though?

Member:

Nice catch (refactoring gone wrong).
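For reference, persisting the error would roughly mean adding it to the update values, using the same Session/update pattern as the rest of this PR; this assumes the Job model has (or gains) an error column, which is not shown here:

def cancel_job(self, job: JobId, error=None) -> None:
    """Sets the job status to cancelled, with optional error message"""
    with Session(self.engine) as session:
        session.execute(
            update(Job)
            .where(Job.id == job)
            .values(status="cancelled", finished=int(time()), error=error)
        )
        session.commit()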


-    def get_job(self, job: JobId) -> JobSchema:
+    def get_job(self, job: JobId) -> Job:
Member:

Seems like the correct typing should be Optional[Job] most likely?

And then we'd want to use one_or_none().

Unless sqlalchemy.orm.exc.NoResultFound is handled somewhere down the line

Member:

It's not handled, but this should raise an exception if a job with the specified ID doesn't exist. Does that make sense? 🤔 It's mostly used in contexts where the UID is known to be good, or where it's user input and a bad ID should just fail.
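The two behaviours being discussed look roughly like this (a sketch; the actual body of get_job isn't shown in this thread):

# Current intent: .one() raises sqlalchemy.orm.exc.NoResultFound for unknown IDs.
job = session.exec(select(Job).where(Job.id == job_id)).one()

# The Optional[Job] alternative suggested above:
maybe_job = session.exec(select(Job).where(Job.id == job_id)).one_or_none()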

-        self.redis.hmset(f"job:{job}", {"status": "removed"})
+        with Session(self.engine) as session:
+            session.execute(
+                update(Job).where(Job.id == job).values(status="removed")
Member:

Some kind of enum/consts for the query statuses at some point probably?

Member:

Yeah, I have it in the TODO, saving it for a refactor 🙏 (I'll need to check how to do this in sqlalchemy, but I'll check other projects I guess).

Member:

Created an issue out of it: #370
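A minimal version of the constants idea could be a str-valued enum, so the column stays a plain text column. The member list below is illustrative; only "cancelled" and "removed" actually appear in this diff:

import enum


class JobStatus(str, enum.Enum):
    # Hypothetical constants for the status column.
    cancelled = "cancelled"
    removed = "removed"

The existing call would then become update(Job).where(Job.id == job).values(status=JobStatus.removed.value).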

"taints": json.dumps(taints),
}
with Session(self.engine) as session:
obj = Job(
Member:

We could theoretically provide the defaults in the model declaration, but since this is the only place the object is initialized, it's probably not worth it at the moment?

Member:

Good question 🤔 it depends on which one is clearer. Having defaults in the model for things like files_in_progress makes sense. Once we have a proper database, I'm considering splitting this into several smaller objects (to avoid one huge Job object that counts everything).
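A sketch of the defaults-in-the-model variant; apart from status, finished and files_in_progress, the fields and default values here are illustrative, and the real model in this PR also has internal_id and other columns:

from typing import Optional

from sqlmodel import Field, SQLModel


class Job(SQLModel, table=True):
    id: str = Field(primary_key=True)  # simplified; the PR's model also has internal_id
    status: str = "new"                # illustrative default
    files_in_progress: int = 0         # the counter mentioned above
    finished: Optional[int] = None

The create path would then only need to pass the values that differ from the defaults.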

@msm-code msm-code force-pushed the feature/postgres-job branch from f5b724b to eae7f24 Compare February 5, 2024 16:30
@msm-cert msm-cert merged commit 9d06fe5 into CERT-Polska:master Feb 5, 2024
10 checks passed